1.
Stud Health Technol Inform; 302: 819-820, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203504

ABSTRACT

To classify sentences in German cardiovascular doctor's letters into eleven section categories, we used pattern-exploiting training, a prompt-based method for text classification in few-shot learning scenarios (20, 50, and 100 instances per class). We applied it to language models with various pre-training approaches and evaluated them on CARDIO:DE, a freely available German clinical routine corpus. Prompting improves accuracy by 5-28% compared to traditional methods, reducing manual annotation effort and computational costs in a clinical setting.


Subjects
Language, Machine Learning, Natural Language Processing, Learning
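
The prompt-based idea in the abstract above can be illustrated with a minimal sketch. Pattern-exploiting training rewrites each input as a cloze phrase with a masked slot and maps each class to a word (the verbalizer) that a masked language model could predict in that slot. The pattern wording, the three section labels, the verbalizer words, and the scores below are all hypothetical and are not taken from the paper; no real language model is called.

```python
import math
from typing import Dict, Tuple

MASK = "[MASK]"

# Hypothetical verbalizer: section label -> one word the model may predict.
VERBALIZER: Dict[str, str] = {
    "medication": "Medikation",
    "diagnosis": "Diagnose",
    "anamnesis": "Anamnese",
}


def build_cloze(sentence: str) -> str:
    """Wrap a sentence in a (hypothetical) cloze pattern with one masked slot."""
    return f"{sentence} Abschnitt: {MASK}."


def classify(mask_scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Turn scores for the masked slot into class probabilities.

    mask_scores maps candidate words to raw scores (logits), as a masked
    language model would produce them. Only verbalizer words are considered,
    and a softmax over those words yields one probability per class.
    """
    logits = {label: mask_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    best = max(probs, key=lambda label: probs[label])
    return best, probs


if __name__ == "__main__":
    print(build_cloze("ASS 100 mg 1-0-0"))
    # Toy scores standing in for real model output.
    label, probs = classify({"Medikation": 2.0, "Diagnose": 0.5, "Anamnese": -1.0})
    print(label, probs)
```

In a few-shot setting, only the handful of labeled sentences per class would be used to adapt the model that produces these scores; the pattern and verbalizer carry the task description.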
2.
Sci Data; 10(1): 207, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37059736

ABSTRACT

We present CARDIO:DE, the first freely available and distributable large German clinical corpus from the cardiovascular domain. CARDIO:DE encompasses 500 clinical routine German doctor's letters from Heidelberg University Hospital, which were manually annotated. Our prospective study design complies with current data protection regulations and allows us to keep the original structure of the clinical documents consistent. To ease access to our corpus, we manually de-identified all letters. To enable various information extraction tasks, the temporal information in the documents was preserved. We added two high-quality manual annotation layers to CARDIO:DE: (1) medication information and (2) CDA-compliant section classes. To the best of our knowledge, CARDIO:DE is the first freely available and distributable German clinical corpus in the cardiovascular domain. In summary, our corpus offers unique opportunities for collaborative and reproducible research on natural language processing models for German clinical texts.

3.
Digit Health; 7: 20552076211057662, 2021.
Article in English | MEDLINE | ID: mdl-34868618

ABSTRACT

OBJECTIVE: A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain, enabling its use in state-of-the-art data-driven deep learning projects. METHODS: We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts from German discharge letters. We compared three bidirectional encoder representations from transformers (BERT) models pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. RESULTS: Our best-performing model, based on a publicly available German pre-trained BERT model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). CONCLUSION: Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages.
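
The token-wise micro-average F1-score reported above can be made concrete with a small sketch. It pools true positives, false positives, and false negatives over all tokens and all concept classes before computing precision and recall. The label names and the convention of an "O" tag for tokens outside any concept are assumptions made for illustration; the paper's exact evaluation script is not described in the abstract.

```python
from typing import List, Tuple


def micro_prf(gold: List[str], pred: List[str], outside: str = "O") -> Tuple[float, float, float]:
    """Token-wise micro-averaged precision, recall, and F1.

    Counts are pooled over all concept labels; the `outside` tag is not
    counted as a class of its own.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != outside and p == g:
            tp += 1  # correct concept token
        else:
            if p != outside:
                fp += 1  # predicted a concept that is wrong or spurious
            if g != outside:
                fn += 1  # missed or mislabeled a gold concept token
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels, one per token.
    gold = ["O", "MED", "MED", "O", "DIAG"]
    pred = ["O", "MED", "O", "DIAG", "DIAG"]
    print(micro_prf(gold, pred))
```

Because counts are pooled before averaging, frequent concepts weigh more heavily than rare ones, which is the main difference from a macro average.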

4.
Stud Health Technol Inform; 278: 187-194, 2021 May 24.
Article in English | MEDLINE | ID: mdl-34042893

ABSTRACT

The HiGHmed consortium aims to create a shared information governance framework to integrate clinical routine data. One challenge is the replacement of unstructured reporting (e.g. doctor's letters) with structured reporting in clinical routine. The Heidelberg cardiology department evaluates dynamic PDF forms for structured data reporting of heart failure (HF) patients. In this use case, we aim to identify potential caveats or shortcomings in data processing at an early stage. We employed data mining strategies to detect patterns related to incomplete or false data, which we found to be present among all data types. We then discuss the characteristics of the baseline patient cohort in Heidelberg to identify peculiarities and potential biases, which may be site-specific. Briefly, our patient population is predominantly male (67%); NYHA I and II are the most common severity classes, while NYHA IV is missing entirely. Most patients have dilated cardiomyopathy (DCM) or coronary heart disease (CHD) diagnosed as the cause of their HF. Finally, we also analyzed how comorbidities and risk factors relate to specific disease entities of heart failure patients. A positive family history was more frequent among cardiomyopathy patients than among CHD patients, who instead more often presented with dyslipidemia. Generally, the most dominant risk factor was arterial hypertension, while at the other end of the scale alcoholism appears to be underreported.


Subjects
Cardiology, Heart Failure, Cohort Studies, Heart Failure/epidemiology, Humans, Male, Risk Factors
5.
Stud Health Technol Inform; 267: 101-109, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31483261

ABSTRACT

One of the major obstacles for research on German medical reports is the lack of de-identified medical corpora. Previous de-identification tasks focused on non-German medical texts, which raised the demand for an in-depth evaluation of de-identification methods on German medical texts. Because of remarkable advancements in natural language processing using supervised machine learning methods on limited training data, we evaluated them for the first time on German medical reports using our annotated data set consisting of 113 medical reports from the cardiology domain. We applied state-of-the-art deep learning methods using pre-trained models as input to a bidirectional LSTM network and well-established conditional random fields for de-identification of German medical reports. We performed an extensive evaluation for de-identification and multiclass named entity recognition. Using rule-based and out-of-domain machine learning methods as a baseline, the conditional random field improved the F2-score from 70% to 93% for de-identification; the neural approach reached an F2-score of 96% while keeping precision and recall balanced. These results show that state-of-the-art machine learning methods can play a crucial role in de-identification of German medical reports.


Subjects
Data Anonymization, Deep Learning, Electronic Health Records, Machine Learning, Natural Language Processing
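
The F2-score used in this entry is the F-beta measure with beta = 2, which weights recall more heavily than precision. A plausible reason for choosing it in de-identification is that a missed identifier is costlier than a false alarm, although the abstract does not state the motivation. The short sketch below only evaluates the standard formula; the precision and recall values in the example are made up.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    beta > 1 favors recall, beta < 1 favors precision, beta = 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Made-up values: the same pair scored with F1 and F2.
    print(f_beta(0.5, 1.0, beta=1.0))
    print(f_beta(0.5, 1.0, beta=2.0))
```

With perfect recall and 50% precision, F2 is higher than F1, which shows how the measure tolerates over-flagging more than it tolerates misses.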
6.
Stud Health Technol Inform; 253: 165-169, 2018.
Article in English | MEDLINE | ID: mdl-30147065

ABSTRACT

Medical texts are a vast resource for medical and computational research. In contrast to newswire or Wikipedia texts, medical texts need to be de-identified before they can be made accessible to a wider NLP research community. We created a prototype for German medical text de-identification and named entity recognition using a three-step approach. First, we used well-known rule-based models based on regular expressions and gazetteers. Second, we used a spelling variant detector based on Levenshtein distance, exploiting the fact that the medical texts contain semi-structured headers including sensitive personal data. Third, we trained a named entity recognition model on out-of-domain data to add statistical capabilities to our prototype. Compared with a baseline based on regular expressions and gazetteers, we improved the F2-score from 78% to 85% for de-identification. Our prototype is a first step for further research on German medical text de-identification and shows that spelling variant detection and statistical models trained on out-of-domain data can improve de-identification performance significantly.


Subjects
Data Anonymization, Electronic Health Records, Patient Admission, Germany, Natural Language Processing
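
The second step of the approach described in this entry, spelling variant detection with Levenshtein distance, can be sketched as follows. The idea is that names found in the semi-structured header may reappear misspelled in the body text, where exact matching would miss them. The function names, the distance threshold, the minimum token length, and the example name are assumptions for illustration and are not taken from the paper.

```python
import re
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_spelling_variants(header_names: List[str], text: str,
                           max_distance: int = 1,
                           min_length: int = 4) -> List[Tuple[str, str]]:
    """Return (token, header_name) pairs for body tokens close to a header name.

    Short tokens are skipped because they match many names by chance.
    """
    hits = []
    for match in re.finditer(r"\w+", text):
        token = match.group()
        if len(token) < min_length:
            continue
        for name in header_names:
            if levenshtein(token.lower(), name.lower()) <= max_distance:
                hits.append((token, name))
                break
    return hits


if __name__ == "__main__":
    # "Mustermann" is a placeholder surname; the body contains a misspelling.
    body = "Herr Mustermnn wurde am Montag aufgenommen."
    print(find_spelling_variants(["Mustermann"], body))
```

A small threshold keeps false positives low; in practice it would likely need tuning against the length of the names being matched.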
...